Skip to content

[Anthropic] Support system role messages inside messages array#44283

Merged
sfeng33 merged 1 commit into
vllm-project:mainfrom
chaunceyjiang:anthropic_system_messages
Jun 2, 2026
Merged

[Anthropic] Support system role messages inside messages array#44283
sfeng33 merged 1 commit into
vllm-project:mainfrom
chaunceyjiang:anthropic_system_messages

Conversation

@chaunceyjiang

@chaunceyjiang chaunceyjiang commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Purpose

[Anthropic] Support system role messages inside messages array

FIX #44000

Test Result

before
image

after
image


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Co-Authored-By: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-Authored-By: Ang Kah Min, Kelvin <syraxius@hotmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@aleksandaryanakiev

Copy link
Copy Markdown
Contributor

This looks better, I'm closing my PR as it's not needed anymore

@chaunceyjiang

Copy link
Copy Markdown
Collaborator Author

/cc @DarkLight1337 @sfeng33 PTAL.

@sfeng33 sfeng33 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@sfeng33 sfeng33 added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 2, 2026
@sfeng33 sfeng33 enabled auto-merge (squash) June 2, 2026 16:20
@sfeng33 sfeng33 merged commit ed9a752 into vllm-project:main Jun 2, 2026
51 checks passed
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Jun 4, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
andakai pushed a commit to andakai/vllm that referenced this pull request Jun 4, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
system_parts.append(block.text)

# System messages embedded inside the messages array
for msg in anthropic_request.messages:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaunceyjiang @aleksandaryanakiev @sfeng33 @andrew @potatosalad I'm a bit concerned about the system role fix. It seems like merging a mid-conversation system:role message into a single system message could cause issues with KV-cache hits. In multi-turn conversations, this would likely change the prefix, potentially hurting cache reuse.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have also observed this issue. The fix here is not correct. I am trying a new solution.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaunceyjiang
OK, I also have an idea here. Later, I will prepare a Merge Request for you. You can check if it meets your requirements.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

felix0080 added a commit to felix0080/vllm that referenced this pull request Jun 5, 2026
…ching

PR vllm-project#44283 merged all inline system:role messages into a single leading
system message, which changes the conversation prefix and breaks
KV-cache hits in multi-turn dialogues.

This fix keeps inline system messages at their original position:

- Remove inline system extraction from _convert_system_message (only
  top-level system is handled there)
- In _convert_messages, handle system messages with a dedicated
  _extract_system_text helper that strips billing headers and only
  emits the message if real content exists — avoiding the
  _convert_block / _convert_message_content path which does not strip
  billing headers and may omit the "content" key
- Add tests for billing header stripping on inline system messages

Unlike vllm-project#44048 which moves the same merge logic to the protocol layer,
this approach fundamentally avoids the prefix-breaking merge entirely.

Co-authored-by: Hermes Agent
@felix0080

felix0080 commented Jun 5, 2026

Copy link
Copy Markdown

I noticed the prefix caching concern discussed here. I opened #44602 with an alternative approach that preserves inline role: system messages at their original position instead of merging them into the leading system message, so the conversation prefix structure stays intact for KV-cache hits. This also handles x-anthropic-billing-header stripping consistently for both top-level and inline system messages. @chaunceyjiang

felix0080 added a commit to felix0080/vllm that referenced this pull request Jun 5, 2026
…ching

PR vllm-project#44283 merged all inline system:role messages into a single leading
system message, which changes the conversation prefix and breaks
KV-cache hits in multi-turn dialogues.

This fix keeps inline system messages at their original position:

- Remove inline system extraction from _convert_system_message (only
  top-level system is handled there)
- In _convert_messages, handle system messages with a dedicated
  _extract_system_text helper that strips billing headers and only
  emits the message if real content exists — avoiding the
  _convert_block / _convert_message_content path which does not strip
  billing headers and may omit the "content" key
- Add tests for billing header stripping on inline system messages

Unlike vllm-project#44048 which moves the same merge logic to the protocol layer,
this approach fundamentally avoids the prefix-breaking merge entirely.

Co-authored-by: Hermes Agent
Signed-off-by: felix0080 <felix0080@users.noreply.github.com>
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
Signed-off-by: JisoLya <523420504@qq.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
@huangyxi

huangyxi commented Jun 8, 2026

Copy link
Copy Markdown

Hello! I found that the current Anthropic streaming response does not include the "assistant" role, which does not align with the official Anthropic response format. As a result, some third-party applications fail to parse the response correctly. For example, this issue occurs in CloudCLI:

https://github.com/siteboon/claudecodeui/blob/dd77649053769b91886897ec64fbd1c15e7a7a75/server/modules/providers/list/claude/claude-sessions.provider.ts#L493

I suggest adding

role="assistant"

between the two lines below. I tested this change locally, and it fixed the issue without affecting the normal Claude Code experience:

id=origin_chunk.id,
content=[],

Since I'm not familiar with the vLLM contribution process, and the issue I found appears to be related to this existing PR, I hope someone from the community can help implement or review this simple fix.

realliujiaxu pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jun 9, 2026
## What this PR does / why we need it?

Backports vLLM PR #44283 via a vllm-ascend platform monkey patch for the
pinned /vllm-workspace/vllm runtime.

The patch accepts `role: system` entries in Anthropic Messages API
`messages`, merges inline system content with the top-level `system`
prompt, strips Claude Code billing headers in both places, and skips
inline system entries when converting the remaining chat history.

Fixes vllm-project/vllm#44000
Backports vllm-project/vllm#44283

## Does this PR introduce _any_ user-facing change?

Yes. Anthropic-compatible `/v1/messages` requests from newer Claude Code
clients can include `role: system` messages inside the `messages` array
without failing validation.

## How was this patch tested?

- `pytest -q
tests/ut/patch/platform/test_patch_anthropic_system_message.py`
- `ruff check
vllm_ascend/patch/platform/patch_anthropic_system_message.py
tests/ut/patch/platform/test_patch_anthropic_system_message.py
vllm_ascend/patch/platform/__init__.py vllm_ascend/patch/__init__.py`
- `ruff format --check
vllm_ascend/patch/platform/patch_anthropic_system_message.py
tests/ut/patch/platform/test_patch_anthropic_system_message.py
vllm_ascend/patch/platform/__init__.py vllm_ascend/patch/__init__.py`

- vLLM version: v0.20.2
- vLLM main:
vllm-project/vllm@9090368

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026
…project#44283)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Aleksandar Yanakiev <alexander.yanakiev@discretestack.com>
Co-authored-by: Ang Kah Min, Kelvin <syraxius@hotmail.com>
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Claude Code CLI >= 2.1.154 sends ctx/msg/system roles and breaks vLLM Anthropic Messages API validation

5 participants